智能论文笔记

BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting

Zheng-Xin Yong , Hailey Schoelkopf , Niklas Muennighoff , Alham Fikri Aji , David Ifeoluwa Adelani , Khalid Almubarak , M Saiful Bari , Lintang Sutawika , Jungo Kasai , Ahmed Baruwa

分类：自然语言处理 | 人工智能 | 机器学习

2022-12-19

The BLOOM model is a large open-source multilingual language model capable of zero-shot learning, but its pretraining was limited to 46 languages. To improve its zero-shot performance on unseen languages, it is desirable to adapt BLOOM, but previous works have only explored adapting small language models. In this work, we apply existing language adaptation strategies to BLOOM and benchmark its zero-shot prompting performance on eight new languages. We find language adaptation to be effective at improving zero-shot performance in new languages. Surprisingly, adapter-based finetuning is more effective than continued pretraining for large models. In addition, we discover that prompting performance is not significantly affected by language specifics, such as the writing system. It is primarily determined by the size of the language adaptation data. We also add new languages to BLOOMZ, which is a multitask finetuned version of BLOOM capable of following task instructions zero-shot. We find including a new language in the multitask fine-tuning mixture to be the most effective method to teach BLOOMZ a new language. We conclude that with sufficient training data language adaptation can generalize well to diverse languages. Our code is available at \url{https://github.com/bigscience-workshop/multilingual-modeling/}.

translated by 谷歌翻译

Large language models (LLMs) can acquire strong code-generation capabilities through few-shot learning. In contrast, supervised fine-tuning is still needed for smaller models to achieve good performance. Such fine-tuning demands a large number of task-specific NL-code pairs, which are expensive to obtain. In this paper, we attempt to transfer the code generation ability of an LLM to a smaller model with the aid of weakly-supervised data. More specifically, we propose explicit knowledge transfer (EKT), which uses the few-shot capabilities of a teacher LLM to create NL-code pairs that we then filter for correctness and fine-tune the student on. We evaluate EKT on the task of generating code solutions to math word problems from the GSM8k dataset. We find that EKT not only yields better performance than training with expert iteration, but also outperforms knowledge distillation, another form of knowledge transfer. A GPT-Neo 1.3B model trained using EKT with a GPT-J teacher achieves a 12.4% pass@100 on GSM8k, while the same student and teacher trained with knowledge distillation yield only a 3.7% pass@100. We also show that it is possible for a student model to outperform the teacher using EKT.

translated by 谷歌翻译

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

translated by 谷歌翻译

我们介绍了一项对自然语言（NL）推理的人类通知，开放域和逻辑上复杂且多样的数据集，配备了一阶逻辑（fol）注释。对开本由1,435个示例（独特的结论）组成，每个示例与487组前提之一搭配，这些场所作为规则，可用于演绎理由，以理解每个结论的有效性。前提和结论的逻辑正确性是通过其平行注释来确保的，这些注释会自动由我们的FOL推理引擎验证。除了主要的NL推理任务外，对开本中的NL-FOL对自动构成了使用FOL作为逻辑形式的新的NL-FOL翻译数据集。我们对广泛的实验系统地评估了对中型语言模型（BERT，ROBERTA）进行微调的FOL推理能力，并且在大型语言模型（GPT-NEOX，OPT，OPT，GPT-3，Codex）上促成了很少的射击。对于NL-FOL翻译，我们尝试使用GPT-3和Codex。我们的结果表明，公开可用的最强大的大语言模型之一（LLM），GPT-3 Davinci，仅比随机结果略好，而在一部分集的一部分中，该模型尤其不好，并且在预测该模型方面尤其不好。纠正虚假和未知结论的真实价值。我们的数据集和代码可在https://github.com/yale-lily/folio上找到。

translated by 谷歌翻译